Adverse drug reactions (ADRs) are a common problem in clinical and pharmacovigilance research and can lead to serious patient harm and biased conclusions if modelled poorly. Drug–ADR data are sparse, noisy, and highly imbalanced, so standard machine learning methods often perform no better than naïve frequency-based approaches unless appropriate low-rank and kernel-based methods are used with clearly stated assumptions [1].
The aim is to predict adverse drug reaction (ADR) profiles by integrating chemical fingerprints and drug–gene interactions using advanced statistical modelling approaches.
The primary objectives are to:
1. Explain the ADR profile prediction problem and its statistical challenges in imbalanced, noisy health data.
2. Explore various statistical methods for ADR prediction.
Figure 1: Distribution of side effects per drug
Figure 2: Drugs with most side effects and most frequent side effects
Figure 3: Drug and side effects similarity
ADR Profile Prediction Methods Using Drug–Gene Interaction and Chemical Fingerprint Features:
1. Naïve Frequency Model: Predicts ADRs based solely on their observed prevalence in the dataset, serving as a baseline.
2. Kernel Regression (KR): Models the relationship between drug features and ADRs using a similarity-based kernel approach.
3. Linear SVM: Classifies ADR presence using a linear hyperplane in feature space.
4. RBF-Kernel SVM: Employs a non-linear radial basis function kernel to capture complex relationships between drug features and ADRs.
5. VKR (NMF + Kernel Ridge Regression): Combines low-rank latent factor decomposition (NMF) with kernel ridge regression to predict ADRs in sparse and imbalanced datasets.
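To make the first two methods in the list concrete, the following is a minimal sketch rather than the implementation from [1] or from the project repository. It assumes a tiny hypothetical dataset, a Gaussian kernel, and an arbitrary bandwidth of 0.5; the kernel regression shown is the Nadaraya-Watson form, in which a new drug's ADR scores are a similarity-weighted average of the training drugs' ADR profiles. All names and values are illustrative.

```python
import math

# Hypothetical toy data: each row is a drug. Feature columns stand in for
# chemical fingerprint / drug-gene interaction features; ADR columns are
# binary indicators (1 = ADR reported for that drug).
train_features = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [0.1, 1.0, 0.0],
]
train_adrs = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]


def naive_frequency_scores(adr_matrix):
    """Score each ADR by its prevalence across training drugs (baseline)."""
    n_drugs = len(adr_matrix)
    n_adrs = len(adr_matrix[0])
    return [sum(row[j] for row in adr_matrix) / n_drugs for j in range(n_adrs)]


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) similarity between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def kernel_regression_scores(x_new, features, adr_matrix, bandwidth=0.5):
    """Nadaraya-Watson estimate: similarity-weighted mean of ADR profiles."""
    weights = [gaussian_kernel(x_new, x, bandwidth) for x in features]
    total = sum(weights)
    if total == 0.0:
        # No training drug is similar enough: fall back to the baseline.
        return naive_frequency_scores(adr_matrix)
    n_adrs = len(adr_matrix[0])
    return [
        sum(w * row[j] for w, row in zip(weights, adr_matrix)) / total
        for j in range(n_adrs)
    ]


new_drug = [0.9, 0.1, 1.0]  # hypothetical drug resembling the first two
baseline = naive_frequency_scores(train_adrs)
kr = kernel_regression_scores(new_drug, train_features, train_adrs)
print("Naive frequency scores:", baseline)
print("Kernel regression scores:", [round(s, 3) for s in kr])
```

The baseline gives every drug the same scores, whereas the kernel regression scores shift towards the ADR profiles of the most similar training drugs. On real data the bandwidth would need to be tuned, for example by cross-validation.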
Figure 4: Early Performance of ADR Prediction Methods
We would like to thank Dr. Yezhao Zhong, Dr. Cathal Seoighe, and Dr. Haixuan Yang for their work on ADR prediction and for sharing their code and data through their GitHub page.
[1] Zhong Y, Seoighe C, Yang H. Non-negative matrix factorization combined with kernel regression for the prediction of adverse drug reaction profiles. Bioinformatics Advances. 2024;4(1):vbae009.
[2] Kuhn M, Letunic I, Jensen LJ, Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016 Jan 4;44(D1):D1075-9.
[3] Kim Y-D, Choi S. Weighted nonnegative matrix factorization. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan; 2009. pp. 1541-1544.
The code and datasets for this project are available in our GitHub repository: https://github.com/arshad4387/ADR-Prediction.git